Missing Data Treatment Using Iterative PCA and Data Reconciliation
Abstract
Two methods, one based on Iterative Principal Components Analysis (IPCA) and the other based on Data Reconciliation, have been developed for estimating a model from a data matrix containing missing data. These algorithms are iterative in nature and analogous to the method based on PCA for treating missing data. The methods incorporate information about the measurement errors to develop the models and are optimal in a maximum likelihood sense. The close connection of the methods with the Expectation Maximization (EM) algorithm is also established. Simulated data from a flow network system with a variety of error structures and missing data are used to evaluate the performance of the proposed methods. For nonuniform errors, the models estimated by the proposed methods were superior in all cases to those obtained by the classical PCA-based missing data treatment algorithms.
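For orientation, the classical PCA-based treatment that the abstract uses as its baseline can be sketched as follows. This is a minimal illustration, not the IPCA or Data Reconciliation method of the paper: it ignores the measurement error covariance that the proposed methods incorporate, and the function name `iterative_pca_impute`, its parameters, and the three-variable example (where the third variable is the sum of the first two, loosely resembling a flow balance) are assumptions made for this sketch.

```python
import numpy as np

def iterative_pca_impute(X, n_components, n_iter=200, tol=1e-8):
    """Fill NaN entries of X by alternating a rank-limited PCA fit and re-imputation.

    Hypothetical helper for illustration; it treats all measurement errors
    as uniform, unlike the methods described in the abstract.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Start from the column means of the observed entries.
    filled = np.where(missing, np.nanmean(X, axis=0), X)
    for _ in range(n_iter):
        mean = filled.mean(axis=0)
        # PCA of the current completed matrix via SVD of the centered data.
        U, s, Vt = np.linalg.svd(filled - mean, full_matrices=False)
        low_rank = (U[:, :n_components] * s[:n_components]) @ Vt[:n_components] + mean
        # Observed entries stay fixed; only missing entries are re-estimated.
        updated = np.where(missing, low_rank, X)
        change = np.linalg.norm(updated - filled)
        filled = updated
        if change < tol:
            break
    return filled

# Usage on noise-free synthetic data with an assumed constraint x3 = x1 + x2.
rng = np.random.default_rng(0)
x1 = rng.uniform(1.0, 5.0, size=30)
x2 = rng.uniform(1.0, 5.0, size=30)
X_true = np.column_stack([x1, x2, x1 + x2])
X_obs = X_true.copy()
X_obs[0, 2] = np.nan
X_obs[5, 0] = np.nan
X_obs[10, 1] = np.nan
X_hat = iterative_pca_impute(X_obs, n_components=2)
print("max abs imputation error:", np.abs(X_hat - X_true).max())
```

Each pass is an expectation-style fill followed by a maximization-style refit, which is the loose sense in which such schemes resemble the EM algorithm mentioned in the abstract.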
Similar resources
On-Line Nonlinear Dynamic Data Reconciliation Using Extended Kalman Filtering: Application to a Distillation Column and a CSTR
Extended Kalman Filtering (EKF) is a nonlinear dynamic data reconciliation (NDDR) method. One of its main advantages is its suitability for on-line applications. This paper presents an on-line NDDR method using EKF. It is implemented for two case studies, temperature measurements of a distillation column and concentration measurements of a CSTR. In each time step, random numbers with zero m...
Handling Missing Values with Regularized Iterative Multiple Correspondence Analysis
A common approach to deal with missing values in multivariate exploratory data analysis consists in minimizing the loss function over all non-missing elements. This can be achieved by EM-type algorithms where an iterative imputation of the missing values is performed during the estimation of the axes and components. This paper proposes such an algorithm, named iterative multiple correspondence ...
Deconstructing Principal Component Analysis Using a Data Reconciliation Perspective
Data reconciliation (DR) and principal component analysis (PCA) are two popular data analysis techniques in process industries. Data reconciliation is used to obtain accurate and consistent estimates of variables and parameters from erroneous measurements. PCA is primarily used as a method for reducing the dimensionality of high dimensional data and as a preprocessing technique for denoising me...
Avoiding Missing Data Biases in Phylogenomic Inference: An Empirical Study in the Landfowl (Aves: Galliformes).
Production of massive DNA sequence data sets is transforming phylogenetic inference, but best practices for analyzing such data sets are not well established. One uncertainty is robustness to missing data, particularly in coalescent frameworks. To understand the effects of increasing matrix size and loci at the cost of increasing missing data, we produced a 90 taxon, 2.2 megabase, 4,800 locus s...
Combining a Logical and a Numerical Method for Data Reconciliation
The reference reconciliation problem consists in deciding whether different identifiers refer to the same data, i.e. correspond to the same real world entity. In this article we present a reference reconciliation approach which combines a logical method for reference reconciliation called L2R and a numerical one called N2R. This approach exploits the schema and data semantics, which is translat...
Publication date: 2004